The transferable skills from ggplot2 are not the idiosyncrasies of plotting syntax, but a powerful way of thinking about visualisation, as a way of mapping between variables and the visual properties of geometric objects that you can perceive.
Source: http://disq.us/p/sv640d
ggplot2 is a huge package: philosophy + functions
Easy: install the tidyverse
install.packages('tidyverse')
Medium: install just ggplot2
install.packages('ggplot2')
Expert: install from GitHub (latest development version)
devtools::install_github('tidyverse/ggplot2')
library(tidyverse)
We’ll use an excerpt of the gapminder dataset provided by the gapminder package by Jenny Bryan.
# uncomment the next line to install {gapminder} package if not installed yet
# install.packages("gapminder")
library(gapminder)
Data to be visualized
Aesthetic mappings from data to visual component
Geometric objects that appear on the plot
Facets group into subplots
Coordinates organize location of geometric objects
Scales define the range of values for aesthetics
Statistics transform data on the way to visualization
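As a rough sketch of how these components can fit together in a single call, assuming the gapminder package is installed. The 2007 subset, the choice of variables, and the names `gap_2007` and `p` are illustrative and not taken from the slides:

```r
library(ggplot2)
library(gapminder)

# Data: a single year of gapminder (illustrative subset)
gap_2007 <- subset(gapminder, year == 2007)

p <- ggplot(gap_2007) +                  # data
  aes(x = gdpPercap, y = lifeExp) +      # aesthetic mappings
  geom_point() +                         # geometric object (uses the identity stat)
  facet_wrap(~ continent) +              # facets
  coord_cartesian() +                    # coordinates (the default, made explicit)
  scale_x_log10() +                      # scale for the x aesthetic
  labs(x = "GDP per capita", y = "Life expectancy")

p
```

Each `+` adds one component, so a plot can be built up and inspected one layer at a time.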
Data
ggplot(data)
Tidy Data
Each variable forms a column
Each observation forms a row
Each observational unit forms a table
Key
What information do I want to use in my visualization?
Is that data contained in one column/row for a given data point?
| country | 1997 | 2002 | 2007 |
|---|---|---|---|
| Canada | 30.30584 | 31.90227 | 33.39014 |
| China | 1230.07500 | 1280.40000 | 1318.68310 |
| United States | 272.91176 | 287.67553 | 301.13995 |
tidy_pop <- gather(messy_pop, 'year', 'pop', -country)
| country | year | pop |
|---|---|---|
| Canada | 1997 | 30.30584 |
| China | 1997 | 1230.07500 |
| United States | 1997 | 272.91176 |
| Canada | 2002 | 31.90227 |
| China | 2002 | 1280.40000 |
| United States | 2002 | 287.67553 |
| Canada | 2007 | 33.39014 |
| China | 2007 | 1318.68310 |
| United States | 2007 | 301.13995 |
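The slides do not show how `messy_pop` was created. A minimal sketch, assuming it is simply the wide table above typed in by hand, and using `pivot_longer()`, which tidyr now recommends over the superseded `gather()`:

```r
library(tibble)
library(tidyr)

# Wide table from above, entered by hand (an assumption about how messy_pop was built)
messy_pop <- tribble(
  ~country,        ~`1997`,    ~`2002`,    ~`2007`,
  "Canada",          30.30584,   31.90227,   33.39014,
  "China",         1230.07500, 1280.40000, 1318.68310,
  "United States",  272.91176,  287.67553,  301.13995
)

# One row per country-year combination; year stays a character column
tidy_pop <- pivot_longer(
  messy_pop,
  cols = -country,
  names_to = "year",
  values_to = "pop"
)

tidy_pop
```

The rows come out ordered by country rather than by year, but the content matches the tidy table above.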
Data
Aesthetic Mapping
+ aes()
Mapping
Map data to visual elements or parameters
year → x
pop → y
country → shape, color, etc.
aes(
x = year,
y = pop,
color = country
)
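A small sketch showing that `aes()` only records which variable goes with which aesthetic; nothing is evaluated until the mapping is combined with data. The name `mapping` is illustrative:

```r
library(ggplot2)

# aes() captures the variable names without needing any data yet
mapping <- aes(x = year, y = pop, color = country)

# Inspect which aesthetics were mapped;
# ggplot2 stores the color aesthetic under the spelling "colour"
names(mapping)
```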
Data
Aesthetic Mapping
Geometric Objects
+ geom_*()
Geometric Objects
Geometric objects displayed on the plot
Here are some of the most widely used geoms
| Type | Function |
|---|---|
| Point | geom_point() |
| Line | geom_line() |
| Bar | geom_bar(), geom_col() |
| Histogram | geom_histogram() |
| Regression | geom_smooth() |
| Boxplot | geom_boxplot() |
| Text | geom_text() |
| Vert./Horiz. Line | geom_vline(), geom_hline() |
| Count | geom_count() |
| Density | geom_density() |
https://eric.netlify.com/2017/08/10/most-popular-ggplot2-geoms/
See http://ggplot2.tidyverse.org/reference/ for many more options
## [1] "geom_abline" "geom_area" "geom_bar"
## [4] "geom_bin2d" "geom_blank" "geom_boxplot"
## [7] "geom_col" "geom_contour" "geom_count"
## [10] "geom_crossbar" "geom_curve" "geom_density"
## [13] "geom_density_2d" "geom_density2d" "geom_dotplot"
## [16] "geom_errorbar" "geom_errorbarh" "geom_freqpoly"
## [19] "geom_hex" "geom_histogram" "geom_hline"
## [22] "geom_jitter" "geom_label" "geom_line"
## [25] "geom_linerange" "geom_map" "geom_path"
## [28] "geom_point" "geom_pointrange" "geom_polygon"
## [31] "geom_qq" "geom_qq_line" "geom_quantile"
## [34] "geom_raster" "geom_rect" "geom_ribbon"
## [37] "geom_rug" "geom_segment" "geom_sf"
## [40] "geom_sf_label" "geom_sf_text" "geom_smooth"
## [43] "geom_spoke" "geom_step" "geom_text"
## [46] "geom_tile" "geom_violin" "geom_vline"
Or just start typing geom_ in RStudio
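A listing like the one above can probably be reproduced along these lines. This is a sketch, and the exact set of names depends on the installed ggplot2 version:

```r
library(ggplot2)

# All exported objects in the attached ggplot2 package whose names start with "geom_"
geoms <- ls("package:ggplot2", pattern = "^geom_")

head(geoms)
length(geoms)
```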
ggplot(tidy_pop)
ggplot(tidy_pop) +
aes(x = year, #<<
y = pop) #<<
ggplot(tidy_pop) +
aes(x = year,
y = pop) +
geom_point() #<<
ggplot(tidy_pop) +
aes(x = year,
y = pop,
color = country) + #<<
geom_point()
ggplot(tidy_pop) +
aes(x = year,
y = pop,
color = country) +
geom_point() +
geom_line() #<<
geom_path: Each group consists of only one observation. Do you need to adjust the group aesthetic?
ggplot(tidy_pop) +
aes(x = year,
y = pop,
color = country) +
geom_point() +
geom_line(
aes(group = country)) #<<
g <- ggplot(tidy_pop) +
aes(x = year,
y = pop,
color = country) +
geom_point() +
geom_line(
aes(group = country))
g
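The grouping message appears because `year` is a character column after tidying, so ggplot2 treats every year as its own group. An alternative sketch, assuming it is acceptable to convert `year` to a number. The data frame is retyped here so the block runs on its own, and `tidy_pop_num` and `p` are illustrative names:

```r
library(ggplot2)

# Same values as the tidy table above; year is character, as gather() leaves it
tidy_pop <- data.frame(
  country = rep(c("Canada", "China", "United States"), times = 3),
  year    = rep(c("1997", "2002", "2007"), each = 3),
  pop     = c(30.30584, 1230.07500, 272.91176,
              31.90227, 1280.40000, 287.67553,
              33.39014, 1318.68310, 301.13995)
)

# With a numeric x, the discrete color mapping alone defines the groups
tidy_pop_num <- transform(tidy_pop, year = as.integer(year))

p <- ggplot(tidy_pop_num) +
  aes(x = year, y = pop, color = country) +
  geom_point() +
  geom_line()   # no explicit group aesthetic needed here

p
```

Keeping `year` as character and setting `aes(group = country)`, as in the slides, gives a discrete axis with one tick per year; the numeric version gives a continuous axis.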
Data
Aesthetic Mapping
Geometric Objects
+ geom_*()
Geometric Objects
geom_*(mapping, data, stat, position)
data: Geoms can have their own data
mapping: Geoms can have their own aesthetics
geom_point needs x and y, optional shape, color, size, etc.
geom_ribbon requires x, ymin and ymax, optional fill
?geom_ribbon
stat: Some geoms apply further transformations to the data
stat = 'identity'
geom_histogram uses stat_bin() to group observations
position: Some adjust location of objects
'dodge', 'stack', 'jitter'
Data
Aesthetic Mapping
Geometric Objects
Facets
+ facet_wrap()
+ facet_grid()
Facets
g + facet_wrap(~ country)
g + facet_grid(continent ~ country)
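Both faceting variables must exist in the plotted data. The `tidy_pop` table shown earlier only has country, year and pop, so this sketch uses the full gapminder data instead (assuming the gapminder package is installed). The subset and the names `recent` and `p` are illustrative:

```r
library(ggplot2)
library(gapminder)

# Keep the three most recent survey years so the grid stays small
recent <- subset(gapminder, year >= 1997)

p <- ggplot(recent) +
  aes(x = gdpPercap, y = lifeExp) +
  geom_point() +
  facet_grid(year ~ continent)   # rows ~ columns

p
```

`facet_wrap()` lays panels out in reading order for one variable, while `facet_grid()` crosses a row variable with a column variable.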
Data
Aesthetic Mapping
Geometric Objects
Facets
Coordinates
+ coord_*()
Coordinates
g + coord_flip()
g + coord_polar()
Data
Aesthetic Mapping
Geometric Objects
Facets
Coordinates
Scales
+ scale_*_*()
Scales
scale + _ + <aes> + _ + <type> + ()
What parameter do you want to adjust? → <aes>
What type is the parameter? → <type>
scale_x_discrete()
scale_size_continuous()
scale_y_log10()
scale_fill_discrete()
scale_color_manual()
g + scale_color_manual(values = c("peru", "pink", "plum"))
g + scale_y_log10()
g + scale_x_discrete(labels = c("MCMXCVII", "MMII", "MMVII"))
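Unnamed values in `scale_color_manual()` are matched to the levels in order. A sketch of the named form, which ties each color to a specific country regardless of order. The one-year data frame and the names `country_colors`, `pop_1997` and `p` are illustrative:

```r
library(ggplot2)

# Named vector: names are the data values, elements are the colors
country_colors <- c(
  "Canada"        = "peru",
  "China"         = "pink",
  "United States" = "plum"
)

# 1997 populations (in millions) from the table earlier in the deck
pop_1997 <- data.frame(
  country = names(country_colors),
  pop     = c(30.30584, 1230.07500, 272.91176)
)

p <- ggplot(pop_1997) +
  aes(x = country, y = pop, color = country) +
  geom_point() +
  scale_color_manual(values = country_colors)

p
```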
Data
Aesthetic Mapping
Geometric Objects
Facets
Coordinates
Scales
Statistics
stat_count()
stat_identity()
Statistics
Stat functions such as stat_count() are rarely called directly. They are typically used through the geom_*() functions that visualize counts, such as geom_bar() (which uses stat_count()) and geom_histogram() (which uses stat_bin()).
ggplot(gapminder, aes(gdpPercap)) +
geom_histogram(aes(y = after_stat(count)))
Note
geom_bar() uses stat_count() by default: it counts the number of cases at each x position.
geom_col() uses stat_identity(): it leaves the data as is.
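A sketch of the difference, assuming the gapminder package is installed: `geom_bar()` counts rows itself, while `geom_col()` expects the heights to be computed beforehand. The names `p_bar`, `counts` and `p_col` are illustrative:

```r
library(ggplot2)
library(gapminder)

# geom_bar(): stat_count() tallies the rows per continent
p_bar <- ggplot(gapminder, aes(x = continent)) +
  geom_bar()

# geom_col(): count first, then plot the precomputed heights as they are
counts <- as.data.frame(table(continent = gapminder$continent))
p_col <- ggplot(counts, aes(x = continent, y = Freq)) +
  geom_col()

p_bar
p_col
```

Both plots should show the same bar heights; only the place where the counting happens differs.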
Data
Aesthetic Mapping
Geometric Objects
Facets
Coordinates
Scales
Statistics
Labels
+ labs()
Labels
g + labs(x = "Year", y = "Population")
Data
Aesthetic Mapping
Geometric Objects
Facets
Coordinates
Scales
Statistics
Labels
Themes
+ theme()
Themes
Change the appearance of plot decorations,
i.e. things that aren’t mapped to data
A few “starter” themes ship with the package
g + theme_bw()
g + theme_dark()
g + theme_gray()
g + theme_light()
g + theme_minimal()
Huge number of parameters, grouped by plot area:
line, rect, text, title
axis: x-, y- or other axis title, ticks, lines
legend: Plot legends
panel: Actual plot area
plot: Whole image
strip: Facet labels
Theme options are supported by helper functions:
element_blank() removes the element
element_line()
element_rect()
element_text()
g + theme_bw()
g + theme_minimal() + theme(text = element_text(family = "sans"))
You can also set the theme globally with theme_set()
All plots will now use this theme!
my_theme <- theme_bw() +
theme(
text = element_text(family = "sans", size = 12),
panel.border = element_rect(colour = 'grey80'),
panel.grid.minor = element_blank()
)
theme_set(my_theme)
g
You may also alter certain aspects of the plot, in addition to the defaults set in theme_set(); in this case, the legend is moved to the bottom.
g + theme(legend.position = 'bottom')
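Because `theme_set()` changes the default for every later plot in the session, it can be useful to undo it. A small sketch relying on the fact that `theme_set()` returns the previous theme; the name `old_theme` is illustrative:

```r
library(ggplot2)

# theme_set() returns the theme that was active before the call
old_theme <- theme_set(theme_bw())

# ... plots created here use theme_bw() by default ...

# Restore the earlier default when done
theme_set(old_theme)
```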
To save your plot, use ggsave()
ggsave(
filename = "my_plot.png",
plot = g,
width = 10,
height = 8,
dpi = 100,
device = "png"
)
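A runnable sketch of the same call, writing to a temporary folder so nothing is left in the working directory. The small data frame and the names `p` and `out_file` are illustrative. By default `width` and `height` are in inches, so 10 x 8 at `dpi = 100` should give an image of roughly 1000 x 800 pixels:

```r
library(ggplot2)

# Tiny illustrative plot (values from the population table earlier in the deck)
pop_canada <- data.frame(
  year = c(1997, 2002, 2007),
  pop  = c(30.30584, 31.90227, 33.39014)
)
p <- ggplot(pop_canada, aes(x = year, y = pop)) +
  geom_line()

# Write into the session's temporary directory
out_file <- file.path(tempdir(), "my_plot.png")

ggsave(
  filename = out_file,
  plot = p,
  width = 10,    # inches (the default unit)
  height = 8,
  dpi = 100
)

file.exists(out_file)
```

If `plot` is omitted, `ggsave()` saves the last plot that was displayed; the device is inferred from the file extension.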
library(gapminder)
head(gapminder)
## # A tibble: 6 x 6
## country continent year lifeExp pop gdpPercap
## <fct> <fct> <int> <dbl> <int> <dbl>
## 1 Afghanistan Asia 1952 28.8 8425333 779.
## 2 Afghanistan Asia 1957 30.3 9240934 821.
## 3 Afghanistan Asia 1962 32.0 10267083 853.
## 4 Afghanistan Asia 1967 34.0 11537966 836.
## 5 Afghanistan Asia 1972 36.1 13079460 740.
## 6 Afghanistan Asia 1977 38.4 14880372 786.
glimpse(gapminder)
Observations: 1,704
Variables: 6
$ country <fct> Afghanistan, Afghanistan, Afghanistan, Afghanistan, ...
$ continent <fct> Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia...
$ year <int> 1952, 1957, 1962, 1967, 1972, 1977, 1982, 1987, 1992...
$ lifeExp <dbl> 28.801, 30.332, 31.997, 34.020, 36.088, 38.438, 39.8...
$ pop <int> 8425333, 9240934, 10267083, 11537966, 13079460, 1488...
$ gdpPercap <dbl> 779.4453, 820.8530, 853.1007, 836.1971, 739.9811, 78...
Let’s start with lifeExp vs gdpPercap
ggplot(gapminder) +
aes(x = gdpPercap,
y = lifeExp)
Add points…
ggplot(gapminder) +
aes(x = gdpPercap,
y = lifeExp) +
geom_point() #<<
How can I tell countries apart?
ggplot(gapminder) +
aes(x = gdpPercap,
y = lifeExp,
color = continent) + #<<
geom_point()
GDP is squished together on the left
ggplot(gapminder) +
aes(x = gdpPercap,
y = lifeExp,
color = continent) +
geom_point() +
scale_x_log10() #<<
Still lots of overlap in the countries…
ggplot(gapminder) +
aes(x = gdpPercap,
y = lifeExp,
color = continent) +
geom_point() +
scale_x_log10() +
facet_wrap(~ continent) + #<<
guides(color = "none") #<<
No need for color legend thanks to facet titles
Lots of overplotting due to point size
ggplot(gapminder) +
aes(x = gdpPercap,
y = lifeExp,
color = continent) +
geom_point(size = 0.25) + #<<
scale_x_log10() +
facet_wrap(~ continent) +
guides(color = "none")
Is there a trend?
ggplot(gapminder) +
aes(x = gdpPercap,
y = lifeExp,
color = continent) +
geom_line() + #<<
geom_point(size = 0.25) +
scale_x_log10() +
facet_wrap(~ continent) +
guides(color = "none")
Okay, that line just connected all of the points sequentially…
ggplot(gapminder) +
aes(x = gdpPercap,
y = lifeExp,
color = continent) +
geom_line(
aes(group = country) #<<
) +
geom_point(size = 0.25) +
scale_x_log10() +
facet_wrap(~ continent) +
guides(color = "none")
We need time on x-axis!
ggplot(gapminder) +
aes(x = year, #<<
y = gdpPercap, #<<
color = continent) +
geom_line(
aes(group = country)
) +
geom_point(size = 0.25) +
scale_y_log10() + #<<
facet_wrap(~ continent) +
guides(color = "none")
Can’t see x-axis labels, though
ggplot(gapminder) +
aes(x = year,
y = gdpPercap,
color = continent) +
geom_line(
aes(group = country)
) +
geom_point(size = 0.25) +
scale_y_log10() +
scale_x_continuous(breaks = #<<
seq(1950, 2000, 25) #<<
) + #<<
facet_wrap(~ continent) +
guides(color = "none")
What about life expectancy?
ggplot(gapminder) +
aes(x = year,
y = lifeExp, #<<
color = continent) +
geom_line(
aes(group = country)
) +
geom_point(size = 0.25) +
#scale_y_log10() + #<<
scale_x_continuous(breaks =
seq(1950, 2000, 25)
) +
facet_wrap(~ continent) +
guides(color = "none")
Okay, let’s add a trend line
ggplot(gapminder) +
aes(x = year,
y = lifeExp,
color = continent) +
geom_line(
aes(group = country)
) +
geom_point(size = 0.25) +
geom_smooth() + #<<
scale_x_continuous(breaks =
seq(1950, 2000, 25)
) +
facet_wrap(~ continent) +
guides(color = "none")
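When no method is given, `geom_smooth()` chooses a smoother on its own and prints a message saying which one it used. A sketch that states the method explicitly, here a straight-line fit per continent (the choice of `"lm"` is an illustration, not what the slides use):

```r
library(ggplot2)
library(gapminder)

p <- ggplot(gapminder) +
  aes(x = year, y = lifeExp, color = continent) +
  geom_smooth(
    method = "lm",     # linear model instead of the automatic choice
    formula = y ~ x,
    se = FALSE         # hide the confidence ribbon
  )

p
```

Because color is mapped to continent, one trend line is fitted for each continent.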
De-emphasize individual countries
ggplot(gapminder) +
aes(x = year,
y = lifeExp,
color = continent) +
geom_line(
aes(group = country),
color = "grey75" #<<
) +
geom_point(size = 0.25) +
geom_smooth() +
scale_x_continuous(breaks =
seq(1950, 2000, 25)
) +
facet_wrap(~ continent) +
guides(color = "none")
Points are still in the way
ggplot(gapminder) +
aes(x = year,
y = lifeExp,
color = continent) +
geom_line(
aes(group = country),
color = "grey75"
) +
#geom_point(size = 0.25) + #<<
geom_smooth() +
scale_x_continuous(breaks =
seq(1950, 2000, 25)
) +
facet_wrap(~ continent) +
guides(color = "none")
Let’s compare continents
ggplot(gapminder) +
aes(x = year,
y = lifeExp,
color = continent) +
geom_line(
aes(group = country),
color = "grey75"
) +
geom_smooth() +
# scale_x_continuous(
# breaks =
# seq(1950, 2000, 25)
# ) +
# facet_wrap(~ continent) + #<<
guides(color = "none")
Wait, what color is each continent?
ggplot(gapminder) +
aes(x = year,
y = lifeExp,
color = continent) +
geom_line(
aes(group = country),
color = "grey75"
) +
geom_smooth() +
theme( #<<
legend.position = "bottom" #<<
) #<<
Let’s try the minimal theme
ggplot(gapminder) +
aes(x = year,
y = lifeExp,
color = continent) +
geom_line(
aes(group = country),
color = "grey75"
) +
geom_smooth() +
theme_minimal() + #<<
theme(
legend.position = "bottom"
)
Fonts are kind of big
ggplot(gapminder) +
aes(x = year,
y = lifeExp,
color = continent) +
geom_line(
aes(group = country),
color = "grey75"
) +
geom_smooth() +
theme_minimal(
base_size = 8) + #<<
theme(
legend.position = "bottom"
)
Cool, let’s switch gears
americas <-
gapminder %>%
filter(
country %in% c(
"United States",
"Canada",
"Mexico",
"Ecuador"
)
)
Let’s look at four countries in more detail. How do their populations compare to each other?
## # A tibble: 48 x 6
## country continent year lifeExp pop gdpPercap
## <fct> <fct> <int> <dbl> <int> <dbl>
## 1 Canada Americas 1952 68.8 14785584 11367.
## 2 Canada Americas 1957 70.0 17010154 12490.
## 3 Canada Americas 1962 71.3 18985849 13462.
## 4 Canada Americas 1967 72.1 20819767 16077.
## 5 Canada Americas 1972 72.9 22284500 18971.
## 6 Canada Americas 1977 74.2 23796400 22091.
## 7 Canada Americas 1982 75.8 25201900 22899.
## 8 Canada Americas 1987 76.9 26549700 26627.
## 9 Canada Americas 1992 78.0 28523502 26343.
## 10 Canada Americas 1997 78.6 30305843 28955.
## # ... with 38 more rows
ggplot(americas) +
aes(
x = year,
y = pop
) +
geom_col()
Yeah, but how many people are in each country?
ggplot(americas) +
aes(
x = year,
y = pop,
fill = country #<<
) +
geom_col()
Bars are “stacked”, can we separate?
ggplot(americas) +
aes(
x = year,
y = pop,
fill = country
) +
geom_col(
position = "dodge" #<<
)
position = "dodge" places objects next to each other instead of overlapping
What is scientific notation anyway?
ggplot(americas) +
aes(
x = year,
y = pop / 10^6, #<<
fill = country
) +
geom_col(
position = "dodge"
)
ggplot aesthetics can take expressions!
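An alternative sketch that leaves the data in raw counts and changes only how the axis labels are printed, using `label_number()` from the scales package (installed alongside ggplot2). The "M" suffix is an illustrative choice:

```r
library(ggplot2)
library(gapminder)

# Same four countries as in the slides
americas <- subset(
  gapminder,
  country %in% c("United States", "Canada", "Mexico", "Ecuador")
)

p <- ggplot(americas) +
  aes(x = year, y = pop, fill = country) +
  geom_col(position = "dodge") +
  scale_y_continuous(
    # Divide by one million for display only and append "M"
    labels = scales::label_number(scale = 1e-6, suffix = "M")
  )

p
```

Dividing inside `aes()` changes the plotted values and the default axis title, whereas a labelling function changes only the tick labels.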
Might be easier to see countries individually
ggplot(americas) +
aes(
x = year,
y = pop / 10^6,
fill = country
) +
geom_col(
position = "dodge"
) +
facet_wrap(~ country) + #<<
guides(fill = "none") #<<
Let range of y-axis vary in each plot
ggplot(americas) +
aes(
x = year,
y = pop / 10^6,
fill = country
) +
geom_col(
position = "dodge"
) +
facet_wrap(~ country,
scales = "free_y") + #<<
guides(fill = "none")
What about life expectancy again?
ggplot(americas) +
aes(
x = year,
y = lifeExp, #<<
fill = country
) +
geom_col(
position = "dodge"
) +
facet_wrap(~ country,
scales = "free_y") +
guides(fill = "none")
This should really be 📈…instead of 📊
ggplot(americas) +
aes(
x = year,
y = lifeExp,
fill = country
) +
geom_line() + #<<
facet_wrap(~ country,
scales = "free_y") +
guides(fill = "none")
📊 are filled 📈 are colored
ggplot(americas) +
aes(
x = year,
y = lifeExp,
color = country #<<
) +
geom_line() +
facet_wrap(~ country,
scales = "free_y") +
guides(color = "none") #<<
Altogether now!
ggplot(americas) +
aes(
x = year,
y = lifeExp,
color = country
) +
geom_line()
Okay, changing gears again. What is the range of life expectancy in the Americas?
gapminder %>%
filter(
continent == "Americas"
) %>% #<<
ggplot() + #<<
aes(
x = year,
y = lifeExp
)
You can pipe into ggplot()!
Just watch for %>% changing to +
Boxplot for life expectancy range
gapminder %>%
filter(
continent == "Americas"
) %>%
ggplot() +
aes(
x = year,
y = lifeExp
) +
geom_boxplot() #<<
Why not boxplots by year?
gapminder %>%
filter(
continent == "Americas"
) %>%
mutate( #<<
year = factor(year) #<<
) %>% #<<
ggplot() +
aes(
x = year,
y = lifeExp
) +
geom_boxplot()
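Converting `year` to a factor is one way to get a box per year. Another sketch keeps `year` numeric and sets the `group` aesthetic instead, which preserves a continuous x axis. The names `americas_all` and `p` are illustrative:

```r
library(ggplot2)
library(gapminder)

americas_all <- subset(gapminder, continent == "Americas")

p <- ggplot(americas_all) +
  aes(x = year, y = lifeExp) +
  geom_boxplot(aes(group = year))   # one box per distinct year

p
```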
OK, what about global life expectancy?
gapminder %>%
# filter(
# continent == "Americas"
# ) %>%
mutate(
year = factor(year)
) %>%
ggplot() +
aes(
x = year,
y = lifeExp
) +
geom_boxplot()
Can we have cute little boxplots for each continent?
gapminder %>%
mutate(
year = factor(year)
) %>%
ggplot() +
aes(
x = year,
y = lifeExp,
fill = continent #<<
) +
geom_boxplot()
Hard to read years, let’s rotate
gapminder %>%
mutate(
year = factor(year)
) %>%
ggplot() +
aes(
x = year,
y = lifeExp,
fill = continent
) +
geom_boxplot() +
coord_flip() #<<
Use dplyr::mutate() to group by decade
gapminder %>%
mutate(
decade = floor(year / 10), #<<
decade = decade * 10, #<<
decade = factor(decade) #<<
) %>%
ggplot() +
aes(
x = decade, #<<
y = lifeExp,
fill = continent
) +
geom_boxplot() +
coord_flip()
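The decade calculation can be checked on a few values outside the plot. A small sketch; the vector `years` is illustrative, and integer division with `%/%` is shown as an equivalent shortcut:

```r
years <- c(1952, 1957, 1962, 2007)

# Drop the last digit: divide by 10, round down, multiply back
decade_floor <- floor(years / 10) * 10

# Same result using integer division
decade_intdiv <- years %/% 10 * 10

decade_floor    # 1950 1950 1960 2000
decade_intdiv
```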
Let’s hide Oceania…
g <- gapminder %>%
filter( #<<
continent != "Oceania" #<<
) %>% #<<
mutate(
decade = floor(year / 10) * 10,
decade = factor(decade)
) %>%
ggplot() +
aes(
x = decade,
y = lifeExp,
fill = continent
) +
geom_boxplot() +
coord_flip()
Labeling the plot
g +
theme_minimal(8) +
labs(
y = "Life Expectancy",
x = "Decade",
fill = NULL,
title = "Life Expectancy by Continent and Decade",
caption = "gapminder.org"
)
Note that x and y refer to the original aesthetics; coord_flip() is applied afterwards.
Remove the legend title by setting fill = NULL.
ggplot2 docs: http://ggplot2.tidyverse.org/
R4DS - Data visualization: http://r4ds.had.co.nz/data-visualisation.html
Hadley Wickham’s ggplot2 book: https://www.amazon.com/dp/0387981403/
esquisse: Interactively build ggplot2 plots
ggplotThemeAssist: Customize your ggplot theme interactively
ggedit: Layer, scale, and theme editing
fivethirtyeight
nycflights
ggplot2movies